Abstract

Both Marcinkiewicz-Zygmund strong laws of large numbers (MZ-SLLNs) and ordinary strong laws of large numbers (SLLNs) for plug-in estimators of general statistical functionals are derived. The approach rests on the following observation: if a statistical functional is "sufficiently regular", then a (MZ-)SLLN for the estimator of the unknown distribution function yields a (MZ-)SLLN for the corresponding plug-in estimator. In particular, it is shown that many L-, V- and risk functionals are "sufficiently regular", and that known results on the strong convergence of the empirical process of α-mixing random variables can be improved. The presented approach not only covers some known results but also provides some new strong laws for plug-in estimators of particular statistical functionals.

Keywords: statistical functional, plug-in estimator, Marcinkiewicz-Zygmund strong law of large numbers, ordinary strong law of large numbers, empirical process, α-mixing

1 Introduction



Let $\mathbf{F}$ be a class of distribution functions on the real line, and $T : \mathbf{F} \to \mathbf{V}'$ be a statistical functional, where $(\mathbf{V}', \|\cdot\|_{\mathbf{V}'})$ is a normed vector space. Let $(X_i) = (X_i)_{i\in\mathbb{N}}$ be a sequence of identically distributed real random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with distribution function $F \in \mathbf{F}$. If $\hat F_n$ denotes a reasonable estimator for $F$ based on the first $n$ observations $X_1, \ldots, X_n$, then $T(\hat F_n)$ can provide a reasonable estimator for $T(F)$. In the context of nonparametric statistics, a central question concerns the rate of almost sure convergence of the plug-in estimator $T(\hat F_n)$ to $T(F)$. That is, one asks for which exponents $r' \ge 0$ the convergence

$$n^{r'}\,\|T(\hat F_n) - T(F)\|_{\mathbf{V}'} \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{1}$$

holds, where it is assumed that the left-hand side is $\mathcal{F}$-measurable for every $n \in \mathbb{N}$. This article is concerned with the convergence in (1) for both $r' > 0$ and $r' = 0$ and general statistical functionals $T$. In the case $r' > 0$ the convergence in (1) can be seen as a Marcinkiewicz-Zygmund strong law of large numbers (MZ-SLLN), and in the case $r' = 0$ it can be seen as an ordinary strong law of large numbers (SLLN).

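Let $(\mathbf{V}, \|\cdot\|_{\mathbf{V}})$ be a normed vector space with $\mathbf{V}$ a class of real functions on $\mathbb{R}$, and assume that the differences $F_1 - F_2$ of any two distribution functions $F_1, F_2 \in \mathbf{F}$ are elements of $\mathbf{V}$. So $\|\cdot\|_{\mathbf{V}}$ can in particular be seen as a metric on $\mathbf{F}$. Assume that $\hat F_n$ is an $\mathbf{F}$-valued estimator for $F$ based on $X_1, \ldots, X_n$, that $\|\hat F_n - F\|_{\mathbf{V}}$ is $\mathcal{F}$-measurable for every $n \in \mathbb{N}$, and that

$$n^{r}\,\|\hat F_n - F\|_{\mathbf{V}} \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{2}$$

for some $r \ge 0$. Finally, let $\hat{\mathbf{F}}_n := \{\hat F_n(\omega) : \omega \in \Omega\}$ be the range of $\hat F_n$, and $\hat{\mathbf{F}}$ be the union of the $\hat{\mathbf{F}}_n$, $n \in \mathbb{N}$. Then, if for every sequence $(F_n) \subset \hat{\mathbf{F}}$ with $\|F_n - F\|_{\mathbf{V}} \to 0$ we have that

$$\|T(F_n) - T(F)\|_{\mathbf{V}'} = O\big(\|F_n - F\|_{\mathbf{V}}^{\beta}\big) \tag{3}$$

for some fixed $\beta > 0$, we obtain by choosing $F_n := \hat F_n$ ($\omega$-wise) that (1) holds for $r' = r\beta$. If for every sequence $(F_n) \subset \hat{\mathbf{F}}$ with $\|F_n - F\|_{\mathbf{V}} \to 0$ we only have that

$$\|T(F_n) - T(F)\|_{\mathbf{V}'} = o(1), \tag{4}$$

then we obtain that (1) holds at least for $r' = 0$; again choose $F_n := \hat F_n$ ($\omega$-wise). That is, in order to obtain a MZ-SLLN for $T(\hat F_n)$ it suffices to have a MZ-SLLN for $\hat F_n$ and to verify (3), and in order to obtain a SLLN for $T(\hat F_n)$ it suffices to have a SLLN for $\hat F_n$ and to verify (4). We refer to (3) as Hölder-$\beta$ continuity of $T$ at $F$ w.r.t. $(\|\cdot\|_{\mathbf{V}}, \|\cdot\|_{\mathbf{V}'})$ and $\hat{\mathbf{F}}$, and to (4) as continuity of $T$ at $F$ w.r.t. $(\|\cdot\|_{\mathbf{V}}, \|\cdot\|_{\mathbf{V}'})$ and $\hat{\mathbf{F}}$.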

Concerning $\hat F_n$ we will restrict ourselves to the empirical distribution function. That is, from now on we assume that $\hat F_n = \frac{1}{n}\sum_{i=1}^{n} 1_{[X_i,\infty)}$. In particular, $\hat{\mathbf{F}}$ will always be contained in the class of all empirical distribution functions $\frac{1}{n}\sum_{i=1}^{n} 1_{[x_i,\infty)}$ with $n \in \mathbb{N}$ and $x_1, \ldots, x_n \in \mathbb{R}$. The rest of the article is organized as follows. In Section 2 we will first present some results that illustrate (2) for uniform and nonuniform sup-norms. Thereafter, in Section 3 we will show that several statistical functionals are (Hölder-$\beta$) continuous w.r.t. uniform or nonuniform sup-norms. The proofs of the results of Section 2 will be given in Sections 4-6.
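To make the objects $\hat F_n$ and $T(\hat F_n)$ concrete, the following minimal Python sketch builds the empirical distribution function of a sample and evaluates a plug-in estimator for one simple functional, the mean $T(F) = \int x\,dF(x)$. The function names `empirical_cdf` and `plug_in_mean` and the choice of the mean functional are illustrative assumptions, not part of the article.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(x):
        # Fraction of observations X_i with X_i <= x,
        # i.e. (1/n) * sum_i 1_{[X_i, inf)}(x).
        return bisect_right(xs, x) / n

    return F_hat


def plug_in_mean(sample):
    """Plug-in estimator T(F_hat) for the mean functional T(F) = int x dF(x)."""
    # F_hat puts mass 1/n on every observation, so the integral
    # with respect to F_hat is just the sample average.
    return sum(sample) / len(sample)


sample = [0.3, -1.2, 2.5, 0.3]
F_hat = empirical_cdf(sample)
print(F_hat(0.3))            # 0.75: three of the four observations are <= 0.3
print(plug_in_mean(sample))  # sample average of the four observations
```

The same pattern applies to any functional $T$: replace the unknown $F$ by the step function $\hat F_n$ and evaluate $T$ there.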

2 Strong laws for $\hat F_n$

An intrinsic example for $(\mathbf{V}, \|\cdot\|_{\mathbf{V}})$ is the normed vector space $(\mathbf{D}_\phi, \|\cdot\|_\phi)$ of all càdlàg functions $\psi$ with $\|\psi\|_\phi < \infty$, where $\|\psi\|_\phi := \|\psi\phi\|_\infty$ refers to the nonuniform sup-norm based on some weight function $\phi$. By weight function we mean any continuous function $\phi : \mathbb{R} \to \mathbb{R}_+$ which is bounded away from zero, i.e. $\phi(\cdot) \ge \varepsilon$ for some $\varepsilon > 0$, and u-shaped, i.e. nonincreasing on $(-\infty, x_\phi]$ and nondecreasing on $[x_\phi, \infty)$ for some $x_\phi \in \mathbb{R}$. In Section 3 we will see that many statistical functionals are (Hölder-$\beta$) continuous w.r.t. $(\|\cdot\|_\phi, |\cdot|)$ and $\hat{\mathbf{F}}$. Here we will first present some results that illustrate (2) for $\|\cdot\|_{\mathbf{V}} = \|\cdot\|_\phi$.

We begin with the case of independent observations. The following result strongly relies on [2, Theorem 7.3]. The proof can be found in Section 4.

Theorem 2.1 Let $(X_i)$ be an i.i.d. sequence of random variables with distribution function $F$. Let $\phi$ be a weight function and $r \in [0, \frac12)$. If $\int \phi(x)^{1/(1-r)}\,dF(x) < \infty$, then

$$n^{r}\,\|\hat F_n - F\|_\phi \longrightarrow 0 \quad \mathbb{P}\text{-a.s.}$$

Let us now turn to the case of weakly dependent data. We will assume that the sequence $(X_i)$ is α-mixing in the sense of [26], i.e. that the mixing coefficient $\alpha(n) := \sup_{k\ge1}\sup_{A,B} |\mathbb{P}[A \cap B] - \mathbb{P}[A]\,\mathbb{P}[B]|$ converges to zero as $n \to \infty$, where the second supremum ranges over all $A \in \sigma(X_1, \ldots, X_k)$ and $B \in \sigma(X_{k+n}, X_{k+n+1}, \ldots)$. For an overview on mixing conditions see [12], [15].

Theorem 2.2 Let $(X_i)$ be a sequence of identically distributed random variables with distribution function $F$. Suppose that $(X_i)$ is α-mixing with mixing coefficients $(\alpha(n))$. Let $r \in [0, \frac12)$ and assume that $\alpha(n) \le K n^{-\vartheta}$ for all $n \in \mathbb{N}$ and some constants $K > 0$ and $\vartheta > 2r$. Then

$$n^{r}\,\|\hat F_n - F\|_\infty \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{5}$$
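The scaled distance $n^r\|\hat F_n - F\|_\infty$ in (5) can be evaluated exactly for a given sample, because $\hat F_n$ is a step function. The sketch below does this in Python for a continuous $F$; the helper names `sup_distance`, `scaled_sup_distance` and `uniform_cdf` are hypothetical and serve only as an illustration of the quantity, not as part of any proof.

```python
def sup_distance(sample, F):
    """Exact value of sup_x |F_hat_n(x) - F(x)| for a continuous distribution function F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # F_hat_n jumps from (i-1)/n to i/n at the i-th order statistic, so the
        # supremum is attained at, or just to the left of, one of the jump points.
        d = max(d, i / n - F(x), F(x) - (i - 1) / n)
    return d


def scaled_sup_distance(sample, F, r):
    """The quantity n^r * ||F_hat_n - F||_inf appearing in (5)."""
    return len(sample) ** r * sup_distance(sample, F)


def uniform_cdf(x):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


print(sup_distance([0.25, 0.75], uniform_cdf))              # 0.25
print(scaled_sup_distance([0.25, 0.75], uniform_cdf, 0.4))  # 2**0.4 * 0.25
```

Evaluating `scaled_sup_distance` along a growing sample for some $r < 1/2$ gives a quick empirical impression of the convergence in (5); it is of course no substitute for the theorem.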

For the proof of Theorem 2.2, which can be found in Section 5, we will combine arguments of [23] and [25]. Under the stronger mixing conditions $\alpha(n) \le K n^{-8}$ and $\alpha(n) \le K n^{-(?+\varepsilon)}$, $\varepsilon > 0$, the convergence in (5) is already known from [7], [23] and [57], respectively. If in (5) almost sure convergence is replaced by convergence in probability, then the result is known from [38]. The more recent article [6] contains a version of Theorem 2.2 for empirical processes of so-called S-mixing sequences. The concept of S-mixing seems to be less restrictive than the concept of α-mixing, but the two concepts are not directly comparable.

To compare Theorem 2.2 above with Theorem 1 in [6] anyway, let $X_t := \sum_{s=0}^{\infty} a_s Z_{t-s}$, $t \in \mathbb{N}$, be a linear process with $(Z_s)_{s\in\mathbb{Z}}$ a sequence of i.i.d. random variables with expectation zero, a finite absolute $p$th moment for some $p > 2$, and a Lebesgue density $f$ satisfying $\int |f(x) - f(y)|\,dx \le M|x - y|$ for all $x, y \in \mathbb{R}$ and some finite constant $M > 0$. For instance, these conditions are fulfilled when $Z_0$ is centered normal. If $a_s = s^{-\gamma}$ for some $\gamma > (2+p)/p$, then results in [15] show that $(X_t)$ is α-mixing with $\alpha(n) \le K n^{-\vartheta}$ for $\vartheta = (p(\gamma - 1) - 2)/(1 + p)$. So, if we choose $\gamma = (3 + 2p)/p$, then we have $\vartheta = 1$ and therefore Theorem 2.2 yields

$$n^{r}\,\|\hat F_n - F\|_\infty \longrightarrow 0 \quad \mathbb{P}\text{-a.s.}, \qquad \forall\, r \in [0, 1/2). \tag{6}$$

On the other hand, in order to obtain (6) with the help of Theorem 1 and the considerations in Section 3.1 of [6], one has to choose $\gamma = (A + (A+1)p)/p$ for some $A > 4$. Since $(A + (A+1)p)/p > (3 + 2p)/p$ for every $A > 4$, Theorem 2.2 above appears to be less restrictive in the α-mixing case than Theorem 1 in [6]. On the other hand, Theorem 1 in [6] covers even the two-parameter empirical process.
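The arithmetic behind this comparison is easy to check with exact rational numbers. The following Python sketch evaluates the decay exponent $\vartheta = (p(\gamma-1)-2)/(1+p)$ for the choice $\gamma = (3+2p)/p$ and compares it with the larger $\gamma$ needed via [6]; the function names and the sample values of $p$ and $A$ are illustrative assumptions.

```python
from fractions import Fraction


def mixing_exponent(gamma, p):
    """theta = (p*(gamma - 1) - 2) / (1 + p), the decay exponent quoted in the text."""
    return (p * (gamma - 1) - 2) / (1 + p)


def gamma_theorem_2_2(p):
    """gamma = (3 + 2p)/p, the choice made in the text for Theorem 2.2."""
    return Fraction(3 + 2 * p, p)


def gamma_reference_6(p, A):
    """gamma = (A + (A + 1)p)/p, the choice the text attributes to [6] (A > 4)."""
    return Fraction(A + (A + 1) * p, p)


for p in (3, 4, 10):
    g = gamma_theorem_2_2(p)
    # The exponent is exactly 1, and the gamma required via [6] is strictly larger.
    print(p, g, mixing_exponent(g, p), gamma_reference_6(p, 5) > g)
```

Since $p(\gamma-1)-2 = 1+p$ for $\gamma = (3+2p)/p$, the exponent equals 1 for every $p$, which is exactly the borderline $\vartheta > 2r$ for all $r < 1/2$.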

It seems to be hard to modify the arguments of the proof of Theorem 2.2 in such a way that they can be applied to the case of a nonconstant weight function $\phi$. To the best of the author's knowledge, there are no respective results in the literature so far. Results of [13] cover the case where in (5) the sup-norm is replaced by the $L^p$-norm w.r.t. a $\sigma$-finite measure for $p > 1$. However, as the case $p = 1$ is excluded, the results do not cover the $L^1$-Wasserstein distance. Notice that several statistical functionals can be shown to be continuous w.r.t. the $L^1$-Wasserstein distance.

If one is content with $r = 0$, i.e. with an ordinary SLLN, then the following Theorem 2.3 gives a respective result for nonconstant weight functions $\phi$ and α-mixing data. The proof of Theorem 2.3 can be found in Section 6. To the best of the author's knowledge, Theorem 2.3 provides the first result on the strong convergence of the empirical distribution function $\hat F_n$ of α-mixing random variables to the underlying distribution function $F$ w.r.t. a nonuniform sup-norm. For any nonincreasing function $h : \mathbb{R}_+ \to [0, 1]$, we let $h^{\to}(y) := \sup\{x \in \mathbb{R}_+ : h(x) > y\}$, $y \in [0, 1]$, be its right-continuous inverse, with the convention $\sup\emptyset := 0$.

Theorem 2.3 Let $(X_i)$ be a sequence of identically distributed random variables with distribution function $F$. Let $\phi$ be a weight function, and suppose that $\int \phi\,dF < \infty$. Suppose that $(X_i)$ is α-mixing with mixing coefficients $(\alpha(n))$, let $\alpha(t) := \alpha(\lfloor t \rfloor)$ be the càdlàg extension of $\alpha(\cdot)$ from $\mathbb{N}$ to $\mathbb{R}_+$, and assume that

$$\int_0^1 \log\big(1 + \alpha^{\to}(s/2)\big)\,\bar G^{\to}(s)\,ds < \infty \tag{7}$$

for $\bar G := 1 - G$, where $G$ denotes the distribution function of $\phi(X_1)$. Then

$$\|\hat F_n - F\|_\phi \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{8}$$

Remark 2.4 Notice that (7) holds in particular if $\mathbb{E}[\phi(X_1)\log^+\phi(X_1)] < \infty$ and $\alpha(n) = O(n^{-\vartheta})$ for some arbitrarily small $\vartheta > 0$; cf. [24, Application 5, p. 924]. ◇



3 Strong laws for $T(\hat F_n)$ for particular functionals $T$

In this section we will show that several statistical functionals $T$ are continuous w.r.t. nonuniform sup-norms $\|\cdot\|_\phi$ or w.r.t. the uniform sup-norm $\|\cdot\|_\infty$. As a consequence we will obtain MZ-SLLNs and SLLNs for $T(\hat F_n)$, cf. the discussion in the Introduction.

3.1 L-functionals 

Let $K$ be the distribution function of a probability measure on $([0,1], \mathcal{B}([0,1]))$, and $\mathbf{F}_K$ be the class of all distribution functions $F$ on the real line for which $\int_{-\infty}^{\infty} |x|\,dK(F(x)) < \infty$. The functional $\mathcal{L}$, defined by

$$\mathcal{L}(F) := \int_{-\infty}^{\infty} x\,dK(F(x)), \qquad F \in \mathbf{F}_K, \tag{9}$$

is called L-functional associated with $K$. It was shown in [8] that if $K$ is continuous and piecewise differentiable, the (piecewise) derivative $K'$ is bounded above and $F \in \mathbf{F}_K$ takes the value $d \in (0, 1)$ at most once if $K$ is not differentiable at $d$, then for every $\lambda > 1$ the functional $\mathcal{L} : \mathbf{F}_K \to \mathbb{R}$ is quasi-Hadamard differentiable at $F$ tangentially to $\mathbf{D}_{\phi_\lambda}$, where $\phi_\lambda(x) := (1 + |x|)^{\lambda}$. This implies in particular that $\mathcal{L}$ is also Hölder-1 continuous at $F$ w.r.t. $(\|\cdot\|_{\phi_\lambda}, |\cdot|)$ and $\hat{\mathbf{F}}$. The assumption that $K'$ be bounded can be relaxed at the cost of a more sophisticated choice of the weight function $\phi$; cf. the following Lemma 3.1. In the lemma we will assume without loss of generality that $F(x) \in (0, 1)$ for all $x \in \mathbb{R}$. If $F$ reaches 0 or 1, then the weight function $\phi_{\gamma,F}$, defined in (10) below, can be modified in the obvious way.

Lemma 3.1 Let $F \in \mathbf{F}_K$, $\bar F := 1 - F$, $0 \le \beta' < \gamma < 1$, and assume that

(a) $K$ is locally Lipschitz continuous at $x$ with local Lipschitz constant $L(x) > 0$ for all $x \in (0,1)$, and $L(x) \le C' x^{-\beta'}(1 - x)^{-\beta'}$ for all $x \in (0,1)$ and some constant $C' > 0$.

(b) $\int_{-\infty}^{0} F(x)^{\gamma-\beta'}\,dx + \int_{0}^{\infty} \bar F(x)^{\gamma-\beta'}\,dx < \infty$.

Assume $F(x) \in (0, 1)$ for all $x \in \mathbb{R}$, and define the weight function

$$\phi_{\gamma,F}(x) := F(x)^{-\gamma}\,1_{(-\infty,0)}(x) + \bar F(x)^{-\gamma}\,1_{[0,\infty)}(x), \qquad x \in \mathbb{R}. \tag{10}$$

Then the functional $\mathcal{L} : \mathbf{F}_K \to \mathbb{R}$ is Hölder-1 continuous at $F$ w.r.t. $(\|\cdot\|_{\phi_{\gamma,F}}, |\cdot|)$ and $\hat{\mathbf{F}}$.

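Proof Since $\mathcal{L}$ can be written as $\mathcal{L}(F) = -\int_{-\infty}^{0} K(F(x))\,dx + \int_{0}^{\infty} (1 - K(F(x)))\,dx$, we obtain by assumption (a)

$$|\mathcal{L}(F_n) - \mathcal{L}(F)| \le \int_{-\infty}^{\infty} |K(F_n(x)) - K(F(x))|\,dx \le \int_{-\infty}^{\infty} L(F(x))\,|(F_n - F)(x)|\,dx \le \Big(C' \int_{-\infty}^{\infty} F(x)^{-\beta'}\,\bar F(x)^{-\beta'}\,\phi_{\gamma,F}(x)^{-1}\,dx\Big)\,\|F_n - F\|_{\phi_{\gamma,F}}$$

for every sequence $(F_n) \subset \hat{\mathbf{F}}$; notice that $\|F_n - F\|_{\phi_{\gamma,F}}$ is finite because of $\gamma < 1$. Since the latter integral is finite by assumption (b), we obtain $|\mathcal{L}(F_n) - \mathcal{L}(F)| = O(\|F_n - F\|_{\phi_{\gamma,F}})$ when $\|F_n - F\|_{\phi_{\gamma,F}} \to 0$. □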

Remark 3.2 Assumption (a) in Lemma 3.1 is fulfilled for every continuous convex distribution function $K$ on the unit interval satisfying $1 - K(x) \le C(1 - x)^{\beta}$ (for all $x \in [0, 1]$ and some $C > 0$) with $\beta = 1 - \beta'$ and $0 \le \beta' < 1$. In this case we can choose $L(x) = C(1 - x)^{-\beta'}$ and $C' = C$. For instance, $K(x) := 1 - (1 - x)^{\beta}$ provides such a distribution function when $0 < \beta \le 1$. ◇



Remark 3.3 Lemma 3.1 shows that the functional $\mathcal{L}$ is Hölder-1 continuous at $F$ when $K$ is locally Lipschitz continuous on $(0, 1)$ and at least Hölder continuous (of a certain order) at 0 and 1. If the kernel $K$ is only piecewise Hölder-$\beta$ continuous on $[0, 1]$ for some $\beta \in (0, 1)$, and $F \in \mathbf{F}_K$ satisfies $\|F - 1_{[0,\infty)}\|_{\phi_\gamma} < \infty$ for some $\gamma > 1/\beta$, then one can derive at least Hölder-$\beta$ continuity of $\mathcal{L}$ at $F$ w.r.t. $(\|\cdot\|_{\phi_\gamma}, |\cdot|)$ and $\hat{\mathbf{F}}$; cf. [39, Theorem 2]. ◇

Theorem 3.4 Let $X_1, X_2, \ldots$ be identically distributed random variables with distribution function $F \in \mathbf{F}_K$. Let $0 \le \beta' < \gamma < 1$, and assume that conditions (a)-(b) of Lemma 3.1 hold.

(i) If the $X_i$ are independent and $F$ satisfies the assumptions of Theorem 2.1 for $r \in [0, \frac12)$ and $\phi = \phi_{\gamma,F}$ defined in (10), then we have $n^{r}\,|\mathcal{L}(\hat F_n) - \mathcal{L}(F)| \to 0$ $\mathbb{P}$-a.s.

(ii) If the sequence $(X_i)$ is α-mixing and satisfies the assumptions of Theorem 2.3 for $\phi = \phi_{\gamma,F}$ defined in (10), then we have at least $|\mathcal{L}(\hat F_n) - \mathcal{L}(F)| \to 0$ $\mathbb{P}$-a.s.

In view of Lemma 3.1 and the discussion in the Introduction, assertions (i) and (ii) in Theorem 3.4 are immediate consequences of Theorems 2.1 and 2.3, respectively. Example 3.5 below sheds light on the assumptions of Theorem 3.4. Part (i) of Theorem 3.4 recovers results from [HI], [20], [35], [39]. Ordinary SLLNs for L-statistics in the fashion of part (ii) of Theorem 3.4 can be found in [33] for i.i.d. data, in [5] for $\phi$-mixing data, and in [1], [5], [17] for ergodic stationary data. In the case of α-mixing data the conditions in [5], [17] are comparable to those of part (ii) in Theorem 3.4. That is, the statements of Theorem 3.4 are basically already known. Nevertheless our approach leads to simple proofs once Theorems 2.1 and 2.3 are established. In the context of general law-invariant risk measures, in Section 3.2 below, we will also take advantage of the method of proof of Theorem 3.4.

Example 3.5 Let $0 \le \beta' < \gamma < 1$, and assume that condition (a) in Lemma 3.1 holds. Further assume that $F(x) = c_1|x|^{-\alpha}$ for all $x \le -x_0$, and $\bar F(x) = c_2 x^{-\alpha}$ for all $x \ge x_0$, for some constants $\alpha, x_0, c_1, c_2 > 0$. In this case, assumption (b) in Lemma 3.1 and the integrability condition in Theorem 2.1 (with $\phi = \phi_{\gamma,F}$) read as

$$\int_{-\infty}^{-1} |x|^{-\alpha(\gamma-\beta')}\,dx + \int_{1}^{\infty} x^{-\alpha(\gamma-\beta')}\,dx < \infty \tag{11}$$

and

$$\int_{-\infty}^{-1} |x|^{\frac{\alpha\gamma}{1-r}-\alpha-1}\,dx + \int_{1}^{\infty} x^{\frac{\alpha\gamma}{1-r}-\alpha-1}\,dx < \infty, \tag{12}$$

respectively. Condition (11) holds if and only if $\gamma > \beta' + \frac{1}{\alpha}$, and condition (12) holds if and only if $\gamma < 1 - r$. So, if we assume $0 \le \beta' + \frac{1}{\alpha} < 1 - r$ and $0 \le r < \frac12$, then the assumptions on $K$ and $F$ imposed in the setting of part (i) of Theorem 3.4 are fulfilled (with any $\gamma \in (\beta' + \frac{1}{\alpha}, 1 - r)$). In particular, if we assume $0 \le \beta' + \frac{1}{\alpha} < 1$, then the assumptions on $K$ and $F$ imposed in the setting of part (ii) of Theorem 3.4 are fulfilled (with any $\gamma \in (\beta' + \frac{1}{\alpha}, 1)$). ◇
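The two conditions of this example reduce to simple inequalities for $\gamma$: a lower bound $\beta' + 1/\alpha$ from (11) and an upper bound $1 - r$ from (12). The small Python helper below, whose name `admissible_gamma_interval` is a hypothetical choice, returns the open interval of admissible $\gamma$ for given tail index, $\beta'$ and rate $r$, or `None` when no admissible $\gamma$ exists.

```python
def admissible_gamma_interval(beta_prime, alpha, r=0.0):
    """Open interval of gamma allowed by conditions (11) and (12), or None if empty.

    Condition (11) requires gamma > beta' + 1/alpha, and condition (12)
    requires gamma < 1 - r. The default r = 0 corresponds to the ordinary SLLN.
    """
    if not (0 <= beta_prime < 1 and alpha > 0 and 0 <= r < 0.5):
        raise ValueError("need 0 <= beta' < 1, alpha > 0 and 0 <= r < 1/2")
    lower = beta_prime + 1 / alpha
    upper = 1 - r
    return (lower, upper) if lower < upper else None


print(admissible_gamma_interval(0.0, 4.0, r=0.25))  # (0.25, 0.75)
print(admissible_gamma_interval(0.5, 2.0))          # None: beta' + 1/alpha = 1
```

The first call shows that tails with index 4 and a kernel with bounded local Lipschitz constant ($\beta' = 0$) allow the rate $r = 1/4$; the second shows that heavy tails combined with an unbounded Lipschitz constant can rule out even the ordinary SLLN in this setting.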



In the following theorem we restrict ourselves to empirical quantile estimators based on α-mixing data. However, it can easily be extended to plug-in estimators of more general L-functionals $\mathcal{L}_K$ with $dK$ having compact support strictly within $(0, 1)$. Under the stronger mixing conditions $\alpha(n) \le K e^{-\varepsilon n}$, $\varepsilon > 0$, and $\alpha(n) \le K n^{-8}$ the result of Theorem 3.6 is basically already known from [3] and [36], respectively. We let $H^{\leftarrow}(x) := \inf\{y \in \mathbb{R} : H(y) \ge x\}$, $x \in \mathbb{R}$, denote the left-continuous inverse of any nondecreasing function $H : \mathbb{R} \to \mathbb{R}$, with the convention $\inf\emptyset := \infty$.

Theorem 3.6 Let $(X_i)$ be an α-mixing sequence of identically distributed random variables with distribution function $F$. Let $r \in [0, \frac12)$, and assume that the mixing coefficients satisfy $\alpha(n) \le K n^{-\vartheta}$ for all $n \in \mathbb{N}$ and some constants $K > 0$, $\vartheta > 2r$. Let $y \in (0, 1)$, and assume that $F$ is differentiable at $F^{\leftarrow}(y)$ with $F'(F^{\leftarrow}(y)) > 0$. Then,

$$n^{r}\,|\hat F_n^{\leftarrow}(y) - F^{\leftarrow}(y)| \longrightarrow 0 \quad \mathbb{P}\text{-a.s.}$$

Proof Since $F^{\leftarrow}(y) = \mathcal{L}_{K_y}(F)$ with $K_y = 1_{[y,1]}$, the proof of Theorem 2 in [39] shows that, under the above assumptions on $F$, $\mathbb{P}$-a.s. there is some constant $C > 0$ such that $|\hat F_n^{\leftarrow}(y) - F^{\leftarrow}(y)| \le C\,\|\hat F_n - F\|_\infty$ for all $n \in \mathbb{N}$. Now the claim follows directly from Theorem 2.2. □
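For the empirical distribution function, the left-continuous inverse $\hat F_n^{\leftarrow}(y)$ is simply an order statistic. The following Python sketch, with the hypothetical helper name `empirical_quantile`, computes it directly from the definition $\inf\{x : \hat F_n(x) \ge y\}$.

```python
import math


def empirical_quantile(sample, y):
    """Left-continuous inverse of the empirical distribution function at y in (0, 1)."""
    if not 0 < y < 1:
        raise ValueError("y must lie strictly between 0 and 1")
    xs = sorted(sample)
    n = len(xs)
    # F_hat_n equals k/n at the k-th order statistic, so F_hat_n(x) >= y first
    # holds at the k-th order statistic with k = ceil(n * y).
    k = math.ceil(n * y)
    return xs[k - 1]


sample = [3.0, 1.0, 4.0, 2.0]
print(empirical_quantile(sample, 0.5))   # 2.0, since F_hat_n(2.0) = 2/4 >= 0.5
print(empirical_quantile(sample, 0.51))  # 3.0, since F_hat_n(2.0) = 0.5 < 0.51
```

Note that `n * y` is computed in floating point, so levels $y$ that are exact multiples of $1/n$ only in exact arithmetic may be rounded to the neighbouring order statistic.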



3.2 Law-invariant coherent risk measures 

Let $\rho$ be a law-invariant coherent risk measure on $\mathcal{X} := L^p(\Omega, \mathcal{F}, \mathbb{P})$ for some $p \in [1, \infty]$, i.e. $\rho$ be a mapping from $\mathcal{X}$ to $\mathbb{R}$ being

• monotone: $\rho(X) \le \rho(Y)$ for all $X, Y \in \mathcal{X}$ with $X \le Y$ $\mathbb{P}$-almost surely,

• translation-equivariant: $\rho(X + m) = \rho(X) + m$ for all $X \in \mathcal{X}$ and $m \in \mathbb{R}$,

• subadditive: $\rho(X + Y) \le \rho(X) + \rho(Y)$ for all $X, Y \in \mathcal{X}$,

• positively homogeneous: $\rho(\lambda X) = \lambda\rho(X)$ for all $X \in \mathcal{X}$ and $\lambda \ge 0$.

Since $\rho$ is law-invariant, we may regard it as a functional $\mathcal{R}$ on the set $\mathbf{F}_p$ of all distribution functions of random variables in $L^p(\Omega, \mathcal{F}, \mathbb{P})$. If the underlying probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is rich enough to support a random variable with continuous distribution (which is equivalent to $(\Omega, \mathcal{F}, \mathbb{P})$ being atomless in the sense of [16, Definition A.26]), then the functional $\mathcal{R}$ admits the representation

$$\mathcal{R}(F) = \sup_{K \in \mathcal{K}_{\mathcal{R}}} \mathcal{L}_K(F) \qquad \forall\, F \in \mathbf{F}_p, \tag{13}$$

where $\mathcal{L}_K$ is the L-functional associated with kernel $K$ (cf. (9)) and $\mathcal{K}_{\mathcal{R}}$ is a suitable set of continuous convex distribution functions on the unit interval. This was shown in [16, Corollary 4.72] for $p = \infty$, and in [22] for the general case. Notice that in [22] the role of $K$ is played by $g$.

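If condition (a) in Lemma 3.1 holds for every $K \in \mathcal{K}_{\mathcal{R}}$ with the same $L(x)$, $\beta'$, $C'$, and $F \in \mathbf{F}_p$ satisfies condition (b) in Lemma 3.1, then, in view of

$$|\mathcal{R}(F_n) - \mathcal{R}(F)| = \Big|\sup_{K \in \mathcal{K}_{\mathcal{R}}} \mathcal{L}_K(F_n) - \sup_{K \in \mathcal{K}_{\mathcal{R}}} \mathcal{L}_K(F)\Big| \le \sup_{K \in \mathcal{K}_{\mathcal{R}}} |\mathcal{L}_K(F_n) - \mathcal{L}_K(F)|,$$

the proof of Lemma 3.1 shows that the functional $\mathcal{R} : \mathbf{F}_p \to \mathbb{R}$ is Hölder-1 continuous at $F$ w.r.t. $(\|\cdot\|_{\phi_{\gamma,F}}, |\cdot|)$ and $\hat{\mathbf{F}}$. So, in this case assertions (i)-(ii) in Theorem 3.4 also hold for $\mathcal{R}$ in place of $\mathcal{L}$. This seems to be the first general respective result in the context of law-invariant coherent risk measures.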

Example 3.7 It is easy to show that

$$\rho_{p,a}(X) := \mathbb{E}[X] + a\,\mathbb{E}\big[((X - \mathbb{E}[X])^{+})^{p}\big]^{1/p}$$

provides a law-invariant coherent risk measure (called risk measure based on one-sided moments) on $L^p(\Omega, \mathcal{F}, \mathbb{P})$ for every $p \in [1, \infty)$ and $a \in [0, 1]$. It was shown in [22, Lemma A.5] that the associated functional $\mathcal{R}_{p,a} : \mathbf{F}_p \to \mathbb{R}$ is not an L-functional when $a > 0$. But according to our preceding discussion $\mathcal{R}_{p,a}$ can be represented as in (13). We clearly have

$$1 - K(x) = \mathcal{L}_K(F_{B_{1-x}}) \le \mathcal{R}_{p,a}(F_{B_{1-x}}) = (1 - x) + a\big((1 - x)\,x^{p}\big)^{1/p} \le (1 + a)(1 - x)^{1/p}$$

(where $F_{B_{1-x}}$ is the Bernoulli distribution function with expectation $1 - x$) for all $x \in (0, 1)$ and $K \in \mathcal{K}_{\mathcal{R}}$. Thus Remark 3.2 and the preceding discussion show that the risk functional $\mathcal{R}_{p,a}$ is Hölder-1 continuous at $F \in \mathbf{F}_p$ w.r.t. $(\|\cdot\|_{\phi_{\gamma,F}}, |\cdot|)$ and $\hat{\mathbf{F}}$, provided $F$ satisfies condition (b) in Lemma 3.1 with $\beta' = 1 - \frac{1}{p}$. ◇
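The Bernoulli computation in this example can be verified numerically. The Python sketch below evaluates $\rho_{p,a}$ for a discrete random variable and compares it with the closed form $(1-x) + a((1-x)x^p)^{1/p}$ and with the upper bound $(1+a)(1-x)^{1/p}$; the function names `rho_one_sided` and `bernoulli_bound_check` are hypothetical and chosen only for this illustration.

```python
def rho_one_sided(values, probs, p, a):
    """rho_{p,a}(X) = E[X] + a * E[((X - E[X])^+)^p]^(1/p) for a discrete X."""
    mean = sum(v * q for v, q in zip(values, probs))
    upper_moment = sum(max(v - mean, 0.0) ** p * q for v, q in zip(values, probs))
    return mean + a * upper_moment ** (1.0 / p)


def bernoulli_bound_check(x, p, a):
    """Return (exact value, closed form, upper bound) for a Bernoulli X with mean 1 - x."""
    # X takes the value 0 with probability x and the value 1 with probability 1 - x.
    exact = rho_one_sided([0.0, 1.0], [x, 1.0 - x], p, a)
    closed_form = (1 - x) + a * ((1 - x) * x ** p) ** (1.0 / p)
    upper_bound = (1 + a) * (1 - x) ** (1.0 / p)
    return exact, closed_form, upper_bound


exact, closed_form, upper_bound = bernoulli_bound_check(0.3, 2, 0.5)
print(abs(exact - closed_form) < 1e-12, closed_form <= upper_bound)  # True True
```

The check uses that $(X - \mathbb{E}[X])^+$ equals $x$ on the event $\{X = 1\}$ and vanishes otherwise, so its $p$th moment is $(1-x)x^p$.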



3.3 V-functionals 

Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a measurable function, and $\mathbf{F}_g$ be the class of all distribution functions $F$ on the real line for which $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g(x_1, x_2)|\,dF(x_1)\,dF(x_2) < \infty$. The functional $\mathcal{V}$, defined by

$$\mathcal{V}(F) := \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1, x_2)\,dF(x_1)\,dF(x_2), \qquad F \in \mathbf{F}_g,$$

is called von Mises-functional (or simply V-functional) associated with $g$. It was shown in [10] that under fairly weak assumptions on $g$ and $F \in \mathbf{F}_g$ the functional $\mathcal{V}$ is Hölder-1 continuous at $F$ w.r.t. $(\|\cdot\|_\phi, |\cdot|)$ and $\hat{\mathbf{F}}$. Thus, from Theorems 2.1-2.3 one can easily derive MZ-SLLNs and SLLNs for $\mathcal{V}(\hat F_n)$; see also [10].
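Plugging the empirical distribution function into $\mathcal{V}$ turns the double integral into a double sum over the sample, the classical V-statistic. The following Python sketch illustrates this for one concrete kernel, $g(x_1, x_2) = (x_1 - x_2)^2/2$, for which $\mathcal{V}(F)$ is the variance of $F$; the kernel choice and the names `v_statistic` and `variance_kernel` are assumptions made for the example.

```python
def v_statistic(sample, g):
    """Plug-in estimator V(F_hat_n) = n^(-2) * sum_{i,j} g(X_i, X_j)."""
    n = len(sample)
    return sum(g(x, y) for x in sample for y in sample) / n ** 2


def variance_kernel(x1, x2):
    """Kernel g(x1, x2) = (x1 - x2)^2 / 2, for which V(F) is the variance of F."""
    return 0.5 * (x1 - x2) ** 2


sample = [1.0, 2.0, 3.0, 4.0]
print(v_statistic(sample, variance_kernel))  # 1.25, the variance with divisor n
```

Because the diagonal terms $g(X_i, X_i)$ are included, the V-statistic differs from the corresponding U-statistic; for the variance kernel this is the familiar difference between the divisors $n$ and $n - 1$.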

MZ-SLLNs for i.i.d. data that can be obtained with the help of Theorem 2.1 are already known from [18], [27], [30]. Related ordinary SLLNs can be found in [21] for i.i.d. data, in [33] for $\phi^*$-mixing data, in [3] for $\beta$-mixing data, and in [1] for ergodic stationary data. The proofs in [34] contain gaps, as was revealed in […, p. 14]. The conditions on $g$, $F$ and the mixing coefficients in [5, Theorem 1] are comparable to those under which Theorem 2.3 and Remark 2.4 above yield ordinary SLLNs, but in our setting we can consider even α-mixing. The assumptions on the kernel $g$ in [1] are more restrictive than the conditions we would have to impose in our setting. On the other hand, ergodicity is a weaker assumption than α-mixing.

To the best of the author's knowledge, so far MZ-SLLNs for weakly dependent data can be found only in [14]. In [13] the data are assumed to be $\beta$-mixing. In the case of a bounded kernel $g$, Theorem 1 in [14] assumes that the mixing coefficients satisfy $\sum_{n=1}^{\infty} n\,\beta(n) < \infty$ in order to obtain a MZ-SLLN for any $r \in [0, 1/2)$. With the help of Theorem 2.2 above this condition can be relaxed to $\alpha(n) = O(n^{-1})$, even in the less restrictive case of α-mixing. On the other hand, Theorem 2 in [13] covers also the case of unbounded kernels $g$.

It was shown in [10] that V-functionals that are degenerate w.r.t. $(g, F)$ are typically even Hölder-2 continuous at $F$ w.r.t. $(\|\cdot\|_\phi, |\cdot|)$ and $\hat{\mathbf{F}}$. So the rate of convergence of degenerate V-statistics is typically twice the rate of convergence of non-degenerate V-statistics; for details see again [10].



4 Proof of Theorem 2.1

By the usual quantile transformation [29, p. 103], we may and do choose a sequence of i.i.d. U[0, 1]-random variables, possibly on an extension $(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}})$ of the original probability space $(\Omega, \mathcal{F}, \mathbb{P})$, such that the corresponding empirical distribution function $\hat G_n$ satisfies $\hat F_n = \hat G_n(F)$ $\bar{\mathbb{P}}$-a.s. Then

$$n^{r}\,\|\hat F_n - F\|_\phi = n^{r} \sup_{x\in\mathbb{R}} |\hat G_n(F(x)) - F(x)|\,\phi(x) \le n^{r} \sup_{s\in(0,1)} |\hat G_n(s) - s|\,w(s)$$

with $w(s) := \phi(\max\{F^{\leftarrow}(s); F^{\to}(s)\})$, where $F^{\leftarrow}$ and $F^{\to}$ denote the left- and the right-continuous inverse of $F$, respectively. According to Theorem 7.3(3) in [2], the latter bound converges $\bar{\mathbb{P}}$-a.s. to 0 as $n \to \infty$ if and only if $\int_{(0,1)} w(s)^{1/(1-r)}\,ds < \infty$. Since $\int_{(0,1)} w(s)^{1/(1-r)}\,ds = \int_{\mathbb{R}} \phi(x)^{1/(1-r)}\,dF(x)$ by a change-of-variable (and the fact that $F^{\leftarrow} = F^{\to}$ $ds$-almost everywhere), and since this integral is finite by assumption, we thus obtain $n^{r}\,\|\hat F_n - F\|_\phi \to 0$ $\mathbb{P}$-a.s.


5 Proof of Theorem 2.2 



In this section, we will prove Theorem 2.2. By the usual quantile transformation [29, p. 103] (which works also for mixing data) it suffices to prove the result in the special case of U[0, 1]-distributed random variables. Let $(U_i) = (U_i)_{i\in\mathbb{N}}$ be an α-mixing sequence of identically U[0, 1]-distributed random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $I$ be the identity on $[0, 1]$, and $\hat G_n := \frac{1}{n}\sum_{i=1}^{n} 1_{[U_i,1]}$ be the empirical distribution function of $U_1, \ldots, U_n$.

Theorem 5.1 Let $r \in [0, 1/2)$, $C > 0$ and $\vartheta > 2r$. Suppose that the mixing coefficients $(\alpha(n))$ of the sequence $(U_i)$ satisfy $\alpha(n) \le C n^{-\vartheta}$ for all $n \in \mathbb{N}$. Then

$$n^{r}\,\|\hat G_n - I\|_\infty \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{14}$$

The proof of Theorem 5.1 will be carried out in three steps (Sections 5.1-5.3). For every $p \in \mathbb{N}_0$, $q \in \mathbb{N}$ and $t \in [0, 1]$, define

$$Z_{p,q}(t) := \Big|\sum_{i=p+1}^{p+q} \big(1_{[0,t]}(U_i) - t\big)\Big|.$$

Thus, in order to verify (14), we have to show

$$\frac{1}{n^{1-r}} \sup_{t\in(0,1)} Z_{0,n}(t) \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{15}$$

In Section 5.1 we will collect some elementary properties of $Z_{p,q}(t)$. In Section 5.2 we will prove some nontrivial properties of $Z_{p,q}(t)$. Finally, in Section 5.3 we will prove (15).

5.1 Auxiliary results, Part I 

Of course, for every $p \in \mathbb{N}_0$ and $q, u \in \mathbb{N}$ with $q < u$, and every $t \in (0, 1)$, the elementary inequality

$$Z_{p,u}(t) \le Z_{p,q}(t) + Z_{p+q,\,u-q}(t) \tag{16}$$

holds; see also [23, p. 333]. Let

$$N_n := \Big\lfloor \frac{\log n}{\log 2} \Big\rfloor \tag{17}$$

be the largest $N \in \mathbb{N}_0$ with $2^{N} \le n$. Then $n$ can be represented as

$$n = 2^{N_n} + \sum_{j=1}^{N_n} h_j(n)\,2^{j-1} \tag{18}$$

for suitable $h_j(n) \in \{0, 1\}$, $j = 1, \ldots, N_n$. Equation (18) and a repeated application of (16) yield that for every $n \in \mathbb{N}$ and $t \in (0, 1)$

$$Z_{0,n}(t) \le Z_{0,2^{N_n}}(t) + \sum_{j=1}^{N_n} Z_{2^{N_n} + b_j(n)2^{j},\,2^{j-1}}(t) \tag{19}$$

holds for suitable integers $b_j(n) \in \{0, \ldots, 2^{N_n-j} - 1\}$, $j \in \{1, \ldots, N_n\}$.
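The block structure behind (18) and (19) is just the binary expansion of $n$. The Python sketch below, with the hypothetical helper name `dyadic_blocks`, lists the blocks $(p, q)$ (meaning the indices $p+1, \ldots, p+q$) that arise when (16) is applied repeatedly; it keeps only the blocks with $h_j(n) = 1$, whereas (19) bounds by all $j$, which is harmless since every $Z_{p,q}(t)$ is nonnegative.

```python
def dyadic_blocks(n):
    """Split {1, ..., n} into consecutive dyadic blocks as in (18)-(19).

    Returns a list of (p, q) pairs, each standing for the index block p+1, ..., p+q.
    The first block is (0, 2**N_n); every further block has the form
    (2**N_n + b * 2**j, 2**(j - 1)) for some j with binary digit h_j(n) = 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    N = n.bit_length() - 1            # largest N with 2**N <= n, cf. (17)
    blocks = [(0, 2 ** N)]
    p = 2 ** N
    for j in range(N, 0, -1):         # digits h_N(n), ..., h_1(n) of n - 2**N
        if (n >> (j - 1)) & 1:
            blocks.append((p, 2 ** (j - 1)))
            p += 2 ** (j - 1)
    return blocks


print(dyadic_blocks(13))  # [(0, 8), (8, 4), (12, 1)]
```

For $n = 13 = 8 + 4 + 1$ the second block starts at $8 = 2^3 + 0\cdot 2^3$ and the third at $12 = 2^3 + 2\cdot 2^1$, so $b_3(13) = 0$ and $b_1(13) = 2$, in line with the range stated after (19).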

5.2 Auxiliary results, Part II

Lemma 5.3 below will be crucial for the central part of the proof of Theorem 5.1 (cf. Section 5.3). For the proof of Lemma 5.3 we will need the following lemma, which in turn is an immediate consequence of Proposition 7.1 in […] and Markov's inequality.

Lemma 5.2 For all $p \in \mathbb{N}_0$, $q \in \mathbb{N}$ and $x > 0$,

$$\mathbb{P}\Big[q^{-1/2} \sup_{t\in(0,1)} Z_{p,q}(t) > x\Big] \le \frac{1}{x^{2}}\Big(1 + 4\sum_{i=0}^{q-1} \alpha(i)\Big)(2 + \log q)^{2}. \tag{20}$$

Now, let $R > r$ be sufficiently close to $r$ (to be specified later on) and $\beta > 0$ be sufficiently close to zero (to be specified later on). For every $N \in \mathbb{N}$, define the event

$$F_N := \Big\{\sup_{t\in(0,1)} Z_{0,2^{N}}(t) > 2^{N(1-R)}\Big\}.$$

For every $N \in \mathbb{N}$, $j \in \{1, \ldots, N\}$ and $b \in \{0, \ldots, 2^{N-j} - 1\}$ define the event

$$H_N(j, b) := \Big\{\sup_{t\in(0,1)} Z_{2^{N}+b2^{j},\,2^{j-1}}(t) > 2^{N(1-R)}\,2^{-\beta(N-j)}\Big\}.$$

Moreover, for every $N \in \mathbb{N}$ define the event

$$H_N := \bigcup_{j=1}^{N}\ \bigcup_{b=0}^{2^{N-j}-1} H_N(j, b).$$

Lemma 5.3 $\mathbb{P}[\limsup_{N\to\infty} F_N] = \mathbb{P}[\limsup_{N\to\infty} H_N] = 0$. In particular, $\mathbb{P}$-a.s. there are some constants $K_1, K_2 > 0$ such that

$$\sup_{t\in(0,1)} Z_{0,2^{N}}(t) \le K_1\,2^{N(1-R)}$$

for all $N \in \mathbb{N}$, and

$$\sup_{t\in(0,1)} Z_{2^{N}+b2^{j},\,2^{j-1}}(t) \le K_2\,2^{N(1-R)}\,2^{-\beta(N-j)}$$

for all $N \in \mathbb{N}$, $j \in \{1, \ldots, N\}$ and $b \in \{0, \ldots, 2^{N-j} - 1\}$.
Proof By Lemma 5.2 and the assumption $\alpha(i) \le C\,i^{-\vartheta}$,

$$\mathbb{P}[F_N] = \mathbb{P}\Big[2^{-N/2} \sup_{t\in(0,1)} Z_{0,2^{N}}(t) > 2^{N(1/2-R)}\Big] \le \frac{1}{2^{N(1-2R)}}\Big(1 + 4\sum_{i=0}^{2^{N}-1} C\,i^{-\vartheta}\Big)(2 + \log 2^{N})^{2} \le K\,2^{-N(\vartheta-2R)}\,N^{2}$$

for some finite constant $K > 0$, where we assumed without loss of generality that $\vartheta \in (0, 1)$. Choosing $R$ sufficiently close to $r$, and taking the assumption $\vartheta > 2r$ into account, we obtain $\sum_{N=1}^{\infty} \mathbb{P}[F_N] < \infty$. Now the Borel-Cantelli lemma yields $\mathbb{P}[\limsup_{N\to\infty} F_N] = 0$.

Again by Lemma 5.2 and the assumption $\alpha(i) \le C\,i^{-\vartheta}$,

$$\mathbb{P}[H_N(j, b)] = \mathbb{P}\Big[2^{-(j-1)/2} \sup_{t\in(0,1)} Z_{2^{N}+b2^{j},\,2^{j-1}}(t) > 2^{-(j-1)/2}\,2^{N(1-R)}\,2^{-\beta(N-j)}\Big] \le \frac{1}{2^{-(j-1)}\,2^{2N(1-R)}\,2^{-2\beta(N-j)}}\Big(1 + 4\sum_{i=0}^{2^{j-1}-1} C\,i^{-\vartheta}\Big)(2 + \log 2^{j-1})^{2} \le K\,2^{j(2-2\beta-\vartheta)}\,2^{-N(2-2R-2\beta)}\,j^{2}$$

for some finite constant $K > 0$, where we again assumed without loss of generality that $\vartheta \in (0, 1)$. Therefore,

$$\mathbb{P}[H_N] \le K\,2^{-N(2-2R-2\beta)} \sum_{j=1}^{N} \sum_{b=0}^{2^{N-j}-1} 2^{j(2-2\beta-\vartheta)}\,j^{2} \le K\,2^{-N(2-2R-2\beta)} \sum_{j=1}^{N} 2^{N-j}\,2^{j(2-2\beta-\vartheta)}\,j^{2} \le K'\,2^{-N(1-2R-2\beta)}\,2^{N(1-\beta-\vartheta)} = K'\,2^{-N(\vartheta-2R-\beta)}$$

for some finite constant $K' > 0$. Choosing $R$ sufficiently close to $r$, choosing $\beta$ sufficiently close to zero, and taking the assumption $\vartheta > 2r$ into account, we obtain $\sum_{N=1}^{\infty} \mathbb{P}[H_N] < \infty$. Now the Borel-Cantelli lemma yields $\mathbb{P}[\limsup_{N\to\infty} H_N] = 0$. □



5.3 Completion of the proof of Theorem 5.1

We now prove (15). By (19) and the definition of $N_n$ as the largest $N \in \mathbb{N}_0$ with $2^{N} \le n$ (cf. (17)), we have

$$\frac{1}{n^{1-r}} \sup_{t\in(0,1)} Z_{0,n}(t) \le \frac{1}{n^{1-r}} \sup_{t\in(0,1)} Z_{0,2^{N_n}}(t) + \frac{1}{n^{1-r}} \sum_{j=1}^{N_n} \sup_{t\in(0,1)} Z_{2^{N_n}+b_j(n)2^{j},\,2^{j-1}}(t) =: I_{n,1} + I_{n,2}$$

for suitable $b_j(n) \in \{0, \ldots, 2^{N_n-j} - 1\}$. In the sequel we will show that $I_{n,1}$ and $I_{n,2}$ converge to zero $\mathbb{P}$-a.s. This will complete the proof of Theorem 5.1.

As for $I_{n,1}$, we observe that by Lemma 5.3 there is $\mathbb{P}$-a.s. a constant $K_1 > 0$ such that $I_{n,1} \le n^{r-1} K_1\,2^{N_n(1-R)} \le K_1\,n^{r-R}$ for all $n \in \mathbb{N}$. Since $R > r$, the summand $I_{n,1}$ thus converges to zero $\mathbb{P}$-a.s.
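
As for $I_{n,2}$, we observe that by Lemma 5.3 there is $\mathbb{P}$-a.s. a constant $K_2 > 0$ such that

$$I_{n,2} \le n^{r-1} K_2 \sum_{j=1}^{N_n} 2^{N_n(1-R)}\,2^{-\beta(N_n-j)} = n^{r-1} K_2\,2^{N_n(1-R)} \sum_{j=0}^{N_n-1} 2^{-\beta j} \le \frac{K_2}{1 - 2^{-\beta}}\,n^{r-R}$$

holds for all $n \in \mathbb{N}$. Since $R > r$, the summand $I_{n,2}$ thus converges to zero $\mathbb{P}$-a.s. This completes the proof of Theorem 2.2.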



6 Proof of Theorem 2.3



Without loss of generality we assume $x_\phi = 0$. So $\phi$ can be seen as a nonincreasing function on $(-\infty, 0]$. We will only show that

$$\sup_{x\in(-\infty,0]} |\hat F_n(x) - F(x)|\,\phi(x) \longrightarrow 0 \quad \mathbb{P}\text{-a.s.}$$

The analogous result for the positive real line can be shown in the same way. We will proceed in three steps, where we will combine arguments of [31]-[32] (Steps 1-2) with Rio's SLLN for α-mixing data (Step 3). The latter can be found in [24, Theorem 1(ii)] and will be recalled in the following theorem. As before, the right-continuous inverse $h^{\to}$ of any nonincreasing function $h : \mathbb{R}_+ \to [0, 1]$ will be defined by $h^{\to}(y) := \sup\{x \in \mathbb{R}_+ : h(x) > y\}$, $y \in [0, 1]$, with the convention $\sup\emptyset := 0$.

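Theorem 6.1 (Rio) Let $\xi_1, \xi_2, \ldots$ be identically distributed random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathbb{E}[|\xi_1|] < \infty$. Suppose that $(\xi_i)$ is α-mixing with mixing coefficients $(\alpha(n))$, and let $\alpha(y) := \alpha(\lfloor y \rfloor)$ be the càdlàg extension of $\alpha(\cdot)$ from $\mathbb{N}$ to $\mathbb{R}_+$. Let $G$ be the distribution function of $|\xi_1|$ and set $\bar G := 1 - G$. If

$$\int_0^1 \log\big(1 + \alpha^{\to}(y/2)\big)\,\bar G^{\to}(y)\,dy < \infty, \tag{21}$$

then $\frac{1}{n}\sum_{i=1}^{n} (\xi_i - \mathbb{E}[\xi_i]) \to 0$ $\mathbb{P}$-a.s.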

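Step 1. Let $L^1(dI)$ be the space of all Lebesgue integrable functions on $[0, 1]$, and $[l, u] := \{f \in L^1(dI) : l \le f \le u\}$ be the bracket of two functions $l, u \in L^1(dI)$ with $l \le u$ pointwise. For any $\varepsilon > 0$, a bracket $[l, u]$ is called $\varepsilon$-bracket if $\int_0^1 (u - l)\,dI < \varepsilon$. Set

$$w(t) := \phi(F^{\leftarrow}(t))\,1_{[0,F(0)]}(t), \qquad t \in [0, 1].$$

Since our assumption $\int \phi\,dF < \infty$ implies $\int_0^1 w\,dI < \infty$, we can find as in [31, Example 19.12] a finite partition $0 = t_0^{\varepsilon} < t_1^{\varepsilon} < \cdots < t_{m_\varepsilon}^{\varepsilon} = 1$ of $[0, 1]$ such that $[l_i^{\varepsilon}, u_i^{\varepsilon}]$ with

$$l_i^{\varepsilon}(\cdot) := w(t_i^{\varepsilon})\,1_{[0,t_{i-1}^{\varepsilon}]}(\cdot),$$

$$u_i^{\varepsilon}(\cdot) := w(t_{i-1}^{\varepsilon})\,1_{[0,t_{i-1}^{\varepsilon}]}(\cdot) + w(\cdot)\,1_{(t_{i-1}^{\varepsilon},t_i^{\varepsilon}]}(\cdot)$$

($i = 1, \ldots, m_\varepsilon$) are $\varepsilon$-brackets in $L^1(dI)$ covering the class $\mathcal{E}_w := \{w_s : s \in [0, 1]\}$ of functions

$$w_s(\cdot) := w(s)\,1_{[0,s]}(\cdot).$$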

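Step 2. By the usual quantile transformation, we can find a sequence of U[0, 1]-random variables $U_1, U_2, \ldots$ (possibly on an extension $(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}})$ of the original probability space $(\Omega, \mathcal{F}, \mathbb{P})$) such that the sequence $(U_i)$ has the same mixing coefficients (under $\bar{\mathbb{P}}$) as the sequence $(X_i)$ under $\mathbb{P}$ and such that the corresponding empirical distribution function $\hat G_n$ satisfies $\hat F_n = \hat G_n \circ F$ $\bar{\mathbb{P}}$-a.s. Here we will show as in the proof of Theorem 2.4.1 in […] that

$$\sup_{x\le0} |\hat F_n(x) - F(x)|\,\phi(x) \le \max_{i=1,\ldots,m_\varepsilon} \max\Big\{\int_0^1 u_i^{\varepsilon}\,d(\hat G_n - I)\,;\ \int_0^1 l_i^{\varepsilon}\,d(I - \hat G_n)\Big\} + \varepsilon \tag{22}$$

for every $\varepsilon > 0$. Since

$$\sup_{x\le0} |\hat F_n(x) - F(x)|\,\phi(x) = \sup_{x\le0} |\hat G_n(F(x)) - F(x)|\,\phi(x) \le \sup_{s\in(0,1)} |\hat G_n(s) - s|\,w(s) = \sup_{s\in(0,1)} \Big|\int_0^1 w_s\,d\hat G_n - \int_0^1 w_s\,dI\Big|,$$

for (22) it suffices to show that

$$\sup_{s\in(0,1)} \Big|\int_0^1 w_s\,d\hat G_n - \int_0^1 w_s\,dI\Big| \le \max_{i=1,\ldots,m_\varepsilon} \max\Big\{\int_0^1 u_i^{\varepsilon}\,d(\hat G_n - I)\,;\ \int_0^1 l_i^{\varepsilon}\,d(I - \hat G_n)\Big\} + \varepsilon. \tag{23}$$

To prove (23), we note that for every $s \in [0, 1]$ there is some $i_s \in \{1, \ldots, m_\varepsilon\}$ such that $w_s \in [l_{i_s}^{\varepsilon}, u_{i_s}^{\varepsilon}]$, cf. Step 1. Therefore, since $[l_{i_s}^{\varepsilon}, u_{i_s}^{\varepsilon}]$ is an $\varepsilon$-bracket,

$$\int_0^1 w_s\,d\hat G_n - \int_0^1 w_s\,dI \le \int_0^1 u_{i_s}^{\varepsilon}\,d\hat G_n - \int_0^1 w_s\,dI = \int_0^1 u_{i_s}^{\varepsilon}\,d(\hat G_n - I) + \int_0^1 (u_{i_s}^{\varepsilon} - w_s)\,dI \le \int_0^1 u_{i_s}^{\varepsilon}\,d(\hat G_n - I) + \int_0^1 (u_{i_s}^{\varepsilon} - l_{i_s}^{\varepsilon})\,dI \le \max_{i=1,\ldots,m_\varepsilon} \int_0^1 u_i^{\varepsilon}\,d(\hat G_n - I) + \varepsilon.$$

Analogously we obtain

$$\int_0^1 w_s\,d\hat G_n - \int_0^1 w_s\,dI \ge -\Big(\max_{i=1,\ldots,m_\varepsilon} \int_0^1 l_i^{\varepsilon}\,d(I - \hat G_n) + \varepsilon\Big).$$

That is, (22) holds true.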

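Step 3. Because of (22), for (8) to be true it suffices to show that both $\int_0^1 l_i^{\varepsilon}\,d(I - \hat G_n)$ and $\int_0^1 u_i^{\varepsilon}\,d(\hat G_n - I)$ converge $\bar{\mathbb{P}}$-a.s. to zero for every $i = 1, \ldots, m_\varepsilon$. The second convergence follows from the representation

$$\int_0^1 u_i^{\varepsilon}\,d(\hat G_n - I) = \frac{1}{n}\sum_{j=1}^{n} \Big(w(t_{i-1}^{\varepsilon})\,1_{[0,t_{i-1}^{\varepsilon}]}(U_j) - \mathbb{E}\big[w(t_{i-1}^{\varepsilon})\,1_{[0,t_{i-1}^{\varepsilon}]}(U_1)\big]\Big) + \frac{1}{n}\sum_{j=1}^{n} \Big(w(U_j)\,1_{(t_{i-1}^{\varepsilon},t_i^{\varepsilon}]}(U_j) - \mathbb{E}\big[w(U_1)\,1_{(t_{i-1}^{\varepsilon},t_i^{\varepsilon}]}(U_1)\big]\Big)$$

and Theorem 6.1, noting that (7) implies (21) for both $\xi_j := w(t_{i-1}^{\varepsilon})\,1_{[0,t_{i-1}^{\varepsilon}]}(U_j)$ and $\xi_j := w(U_j)\,1_{(t_{i-1}^{\varepsilon},t_i^{\varepsilon}]}(U_j)$. The verification of the first convergence is even easier. This completes the proof of Theorem 2.3.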